For this session, you will need the following packages:
Install them by calling
install.packages(c(
  "gapminder",
  "gganimate",
  "plotly",
  "transformr"
))
For gganimate to be able to produce animated images, you need to install a renderer. gifski is usually the best choice, but can be somewhat tricky to install depending on what operating system you are on. If you aren’t able to install gifski for some reason, you can also use ImageMagick.
First, simply try to run
install.packages("gifski")
or
install.packages("magick")
If either of the above works without a hitch, then you’re done! If not, try the advanced instructions below.
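If you are unsure whether either installation succeeded, a quick check along the following lines may help. This is only a sketch: it uses base R's requireNamespace() to test whether each renderer package can be loaded, and the variable names renderers and available are chosen here purely for illustration.

```r
# Test whether each renderer package can be loaded, without attaching it.
renderers <- c("gifski", "magick")
available <- vapply(renderers, requireNamespace, logical(1), quietly = TRUE)

# Named logical vector: TRUE for every renderer package that is installed.
print(available)

if (!any(available)) {
  message("No renderer package found; see the installation instructions.")
}
```

If at least one entry is TRUE, gganimate should have a renderer to work with.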
On Windows, try to install ImageMagick by downloading and running the recommended installation file at https://imagemagick.org/script/download.php#windows.
After this, try to run install.packages("magick") again.
To install gifski on macOS, first make sure you have Homebrew installed. If you do not, open the macOS terminal and run
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
to install it. Afterwards, run
brew install rustc
and then install the gifski R package by running
install.packages("gifski")
as before.
You can also try to install the Mac App Store application here, if the solution above does not work.
You can install ImageMagick on macOS using Homebrew. First make sure you’ve installed Homebrew (as outlined above) and then run
brew install imagemagick
On Linux, installing gifski is easy using snap. Simply go to https://snapcraft.io/gifski for instructions for any given distribution.
Do you remember the animated plots we produced in the introductory presentation workshop based on the Gapminder Hans Rosling animated visualization?
In this worked example, we’ll work out how to reproduce that plot as both an animated and an interactive visualization.
The dataset that we’ll use is available via the
gapminder package. Loading
gapminder makes the dataset directly available in an object called
gapminder. These are the first few rows of the dataset.
library(gapminder)
head(gapminder)
## # A tibble: 6 × 6
##   country     continent  year lifeExp      pop gdpPercap
##   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
## 1 Afghanistan Asia       1952    28.8  8425333      779.
## 2 Afghanistan Asia       1957    30.3  9240934      821.
## 3 Afghanistan Asia       1962    32.0 10267083      853.
## 4 Afghanistan Asia       1967    34.0 11537966      836.
## 5 Afghanistan Asia       1972    36.1 13079460      740.
## 6 Afghanistan Asia       1977    38.4 14880372      786.
The variables should be self-explanatory.
Let’s jump right in! Create a bubble plot faceted on year (which we cut into groups), with population mapped to the size of the bubbles, and GDP per capita and life expectancy on the x and y axes respectively.
Here’s some code to get started:
library(tidyverse)
gapminder |>
  mutate(years = cut_interval(year, length = 5)) |>
  ggplot(...)
Just as I said in the first presentation, this visualization is not (yet) working out so well for us. Let’s make it animated instead. For this, we’ll use the gganimate package.
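For reference, one possible way to complete the starter code for the faceted bubble plot is sketched here. Treat it as an illustration, not the intended solution: colouring by continent, the logarithmic x axis, the axis labels and the object name p_static are all assumptions that go beyond what the exercise asks for.

```r
library(ggplot2)
library(dplyr)
library(gapminder)

p_static <- gapminder |>
  # Cut the years into five-year groups to facet on.
  mutate(years = cut_interval(year, length = 5)) |>
  ggplot(aes(gdpPercap, lifeExp, size = pop, colour = continent)) +
  geom_point(alpha = 0.5) +
  scale_x_log10() +          # assumption: log scale spreads out low-GDP countries
  facet_wrap(~years) +
  labs(x = "GDP per capita", y = "Life expectancy")

p_static
```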
Build the plot as before, but now make it animated by adding
the transition_time() function to the plot, mapping the animation to
year. Also use title = "Year: {frame_time}" in your labs() call
to animate the label, showing which year it is.
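A minimal sketch of what such an animated plot might look like follows. The colour mapping, the logarithmic x axis and the object name p_anim are assumptions made for this illustration; only transition_time(year) and the title come from the instructions above.

```r
library(ggplot2)
library(gganimate)
library(gapminder)

p_anim <- ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, colour = continent)) +
  geom_point(alpha = 0.5) +
  scale_x_log10() +
  labs(
    title = "Year: {frame_time}",   # filled in by gganimate for each frame
    x = "GDP per capita",
    y = "Life expectancy"
  ) +
  transition_time(year)             # one animation state per value of year

# Rendering requires a renderer such as gifski or magick:
# animate(p_anim)
```

Printing p_anim at the console also triggers rendering, which can take a little while.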
If you think the plot is still crowded, we could alternatively use
facets to separate continents. If you want to, you can make use of the
country_colors object that is included in the gapminder package by
adding the following line to your plot.
scale_colour_manual(values = country_colors, guide = "none")
Try to add ease_aes("cubic-in-out") to the plot to change the
transition function and see what the difference is. There are
other options available if you check out the documentation for
ease_aes() too.
So far our plot does a good job of showing the trends among the various continents of the world but is hard to use if we are interested in one specific country. A remedy for this can be to use labels to let us identify which bubble belongs to which country. The large number of countries, however, means that it’s not a frightfully good idea to label all of them.
Instead, we’ll pick out the largest two countries (at the latest
time stamp) on each continent and label those. First, we
store the names of the countries in a vector, large_country_names.
The following steps first filter the dataset so that only
observations from the latest year (max(year)) are kept, then group the
dataset by continent, then slice the dataset so that the observations
(countries) with the largest and next-to-largest values of population (pop)
in each group (continent) are kept, and finally pull out (using pull())
the country names.
large_country_names <-
  gapminder |>
  filter(year == max(year)) |>
  group_by(continent) |>
  slice_max(pop, n = 2) |>
  pull(country)
large_country_names
##  [1] Nigeria       Egypt         United States Brazil        China
##  [6] India         Germany       Turkey        Australia     New Zealand
## 142 Levels: Afghanistan Albania Algeria Angola Argentina Australia ... Zimbabwe
Then we filter the original dataset to create a separate dataset for our labels.
large_countries <- filter(gapminder, country %in% large_country_names)
Now it’s your turn to try to put everything together. Label the countries with
geom_label_repel() from the ggrepel package, in order to avoid
overlapping labels. Note that working with labels and animated visualizations
is something of a challenge. I had to tweak the settings (mostly
nudge_x and nudge_y) several times in order to get something that looks
good.
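As a starting point, the pieces could be combined roughly as follows. This is a hedged sketch and not the code behind the figure: the nudge_y and seed values, the object name p_labelled and the choice to hide the legend are assumptions, and the ggrepel package needs to be installed separately since it is not in the package list at the top.

```r
library(ggplot2)
library(dplyr)
library(gganimate)
library(ggrepel)
library(gapminder)

# Two most populous countries per continent in the latest year.
large_country_names <- gapminder |>
  filter(year == max(year)) |>
  group_by(continent) |>
  slice_max(pop, n = 2) |>
  pull(country)

large_countries <- filter(gapminder, country %in% large_country_names)

p_labelled <- ggplot(gapminder, aes(gdpPercap, lifeExp)) +
  geom_point(aes(size = pop, colour = country), alpha = 0.5, show.legend = FALSE) +
  geom_label_repel(
    aes(label = country),
    data = large_countries,   # only the selected countries get labels
    nudge_y = 5,              # assumption: an arbitrary starting value to tweak
    seed = 1                  # assumption: fixed seed to reduce label jitter
  ) +
  scale_colour_manual(values = country_colors) +
  scale_size(range = c(2, 12)) +
  scale_x_log10() +
  labs(title = "Year: {frame_time}", x = "GDP per capita", y = "Life expectancy") +
  transition_time(year)

# animate(p_labelled)
```

Expect to adjust the nudge settings by trial and error, since the label positions are recomputed for every frame.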
The final result should look something like the following figure.
Figure 1: Life expectancy and GDP per capita with countries. The two largest countries on each continent (in terms of population in the latest year) have been labeled.
Interactive visualizations are often effective, particularly when we want to visualize a complicated dataset such as this one. Here we’ll use the plotly package to do so, which, as you may recall from the lecture, works well in tandem with ggplot. First install the package.
install.packages("plotly")
Then load the package.
library(plotly)
Now we redraw the plot, adding an interactive slider to select the year using
plotly. Make note of the additional mapping that we’ve
added to geom_point(), namely frame, which is a special
mapping that will let plotly know which variable to use to separate
the visualization into frames.
p <- ggplot(gapminder, aes(gdpPercap, lifeExp)) +
  geom_point(aes(frame = year), alpha = 0.5) +
  scale_colour_manual(values = country_colors, guide = "none") +
  scale_size(range = c(2, 12)) +
  scale_x_log10() +
  facet_wrap(~continent) +
  labs(x = "GDP per capita", y = "Life expectancy")
ggplotly(p)
Figure 2: An interactive visualization using plotly for the Gapminder data.
Notice how seamless the conversion of ggplots into interactive plots can be with the help of plotly.
Try to modify the plot by adding additional dummy mappings in the aes() call
to the main ggplot function to be able to obtain information
on these variables in the tooltips too.
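One way this might look is sketched here. It assumes that ggplotly(), with its default tooltip = "all" setting, includes every aesthetic mapped in the main ggplot() call in the tooltip, including names that ggplot2 itself does not use; the dummy aesthetic names country and continent, the size mapping, and the object names p_tooltip and fig are chosen for illustration only.

```r
library(ggplot2)
library(plotly)
library(gapminder)

p_tooltip <- ggplot(
  gapminder,
  aes(
    gdpPercap, lifeExp,
    size = pop,
    country = country,       # dummy mapping: not drawn, only carried along
    continent = continent    # dummy mapping: not drawn, only carried along
  )
) +
  geom_point(aes(frame = year), alpha = 0.5) +
  scale_size(range = c(2, 12)) +
  scale_x_log10() +
  facet_wrap(~continent) +
  labs(x = "GDP per capita", y = "Life expectancy")

# Convert to an interactive plotly figure with a year slider.
fig <- ggplotly(p_tooltip)
fig
```

If the tooltips become too busy, the tooltip argument of ggplotly() accepts a character vector naming the mappings to show.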